Neural network data processing apparatus and method

ABSTRACT

Embodiments of the invention relate to a data processing apparatus comprising a processor configured to provide a neural network, wherein the neural network comprises a neural network layer configured to generate, from an array of input data values, an array of output data values based on a plurality of position dependent kernels and a plurality of input data values of the array of input data values. Moreover, embodiments of the invention relate to a corresponding data processing method.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/EP2017/057089, filed on Mar. 24, 2017, the disclosure of which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

Generally, embodiments of the invention relate to the field of machine learning or deep learning based on neural networks. Embodiments of the invention relate to a neural network data processing apparatus and method, in particular for processing data in the fields of audio processing, computer vision, image or video processing, classification, detection and/or recognition.

BACKGROUND

Guided up-scaling is commonly used in many signal processing applications, especially in image up-scaling methods for image quality improvement, super-resolution and many others [Kaiming He, Jian Sun, Xiaoou Tang, “Guided Image Filtering”, ECCV 2010]. It is a process in which input data is combined with additional input in the form of up-scaling weights, which control the influence of each input data value on the result, to form the output data.

In deep learning, a common approach recently used in many application fields is the utilization of convolutional neural networks (CNNs). Generally, a part of such convolutional neural networks is at least one convolution (or convolutional) layer which performs a convolution of input data values with a learned kernel K, producing one output data value per convolution kernel for each output position [J. Long, E. Shelhamer, T. Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015]. For the two-dimensional case used, for instance, in image processing, the convolution using the learned kernel K can be expressed mathematically as follows:

$\mathrm{out}(x,y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} \mathrm{in}(x-i,\, y-j) \cdot K(i,j) \; [+B],$

wherein out(x,y) denotes the array of output data values, in(x−i,y−j) denotes a sub-array of input data values and K(i,j) denotes the kernel comprising an array of kernel weights or kernel values of size (2r+1)×(2r+1). B denotes a learned bias term, which can be added for obtaining each output data value. The weights of the kernel K are the same for the whole array of input data values in(x,y) and are generally learned during a learning phase of the neural network which, in case of 1st order methods, consists of iteratively back-propagating the gradients of the neural network output back to the input layers and updating the weights of all the network layers by a partial derivative computed in this way. An extension of CNNs are deconvolutional neural networks (DNNs), which add an element called deconvolution that extends their functionality relative to CNNs. Deconvolution can be interpreted as an “inverse” of the convolution known from classical CNNs.
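
By way of a non-limiting illustration, the conventional convolution with a single position independent kernel K given above can be sketched in NumPy as follows. The function name `conv2d_same`, the zero padding at the array border and the convention that the first array axis is x and the second is y are assumptions made for this sketch only; they are not specified in the text.

```python
import numpy as np

def conv2d_same(inp, K, B=0.0):
    """out(x,y) = sum_{i,j=-r..r} in(x-i, y-j) * K(i,j) + B.

    inp: (H, W) array of input data values.
    K:   (2r+1, 2r+1) kernel, K[i + r, j + r] holds K(i,j).
    Values outside the input array are treated as zero (assumption).
    """
    r = K.shape[0] // 2
    H, W = inp.shape
    padded = np.pad(inp, r)            # zero padding on every side
    out = np.full((H, W), B, dtype=float)
    for i in range(-r, r + 1):
        for j in range(-r, r + 1):
            # padded[x - i + r, y - j + r] equals in(x - i, y - j)
            out += K[i + r, j + r] * padded[r - i:r - i + H, r - j:r - j + W]
    return out
```

Note that the same kernel K is used at every position (x,y), which is the property that the position dependent kernels described in the following replace.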

SUMMARY

It is an object of the invention to provide an improved data processing apparatus and method based on neural networks.

The foregoing and other objects are achieved by the subject matter of the independent claims. Further embodiments are apparent from the dependent claims, the description and the figures.

Generally, embodiments of the invention provide a new approach for deconvolution or upscaling of data for neural networks that is implemented into a neural network as a new type of neural network layer. The neural network layer can compute up-scaled data using individual up-scaling weights that are learned for each individual spatial position. Up-scaling weights can be computed as a function of position dependent weights or similarity features and position independent learned weight kernels, resulting in individual up-scaling weights for each input spatial position. In this way a variety of sophisticated position dependent or position adaptive kernels learned by the neural network can be utilized for better adaptation of the up-scaling weights to the input data.

A first aspect of the invention relates to a data processing apparatus comprising one or more processors configured to provide a neural network. The data to be processed by the data processing apparatus can be, for instance, two-dimensional image or video data or one-dimensional audio data.

The neural network provided by the one or more processors of the data processing apparatus comprises a neural network layer being configured to process an array of input data values, such as a two-dimensional array of input data values in(x,y), into an array of output data values, such as a two-dimensional array of output data values out(x,y). The neural network layer can be a first layer or an intermediate layer of the neural network.

The array of input data values can be one-dimensional (i.e. a vector, e.g. audio or another temporal sequence), two-dimensional (i.e. a matrix, e.g. an image or other temporal or spatial sequence), or N-dimensional (e.g. any kind of N-dimensional feature array, e.g. provided by a conventional pre-processing or feature extraction and/or by other layers of the neural network).

The array of input data values can have one or more channels, e.g. for an RGB image one R-channel, one G-channel and one B-channel, or for a black/white image only one grey-scale or intensity channel. The term “channel” can refer to any “feature”, e.g. features obtained from conventional pre-processing or feature extraction or from other neural networks or neural network layers of the same neural network. The array of input data values can comprise, for instance, two-dimensional RGB or grey scale image or video data representing at least a part of an image, or a one-dimensional audio signal. In case the neural network layer is implemented as an intermediate layer of the neural network, the array of input data values can be, for instance, an array of similarity features generated by previous layers of the neural network on the basis of an initial, i.e. original array of input data values, e.g. by means of a feature extraction.

The neural network layer is configured to generate from the array of input data values the array of output data values on the basis of a plurality of position dependent, i.e. spatially variable kernels and a plurality of different input data values of the array of input data values. Each kernel comprises a plurality of kernel values (also referred to as kernel weights). For a respective position or element of the array of input data values a respective kernel is applied thereto for generating a respective sub-array of the array of output data values. In one embodiment, the plurality of kernel values of a respective position dependent kernel can be respectively multiplied with a respective input data value for generating a respective sub-array of the array of output data values having the same size as the position dependent kernel, i.e. the array of kernel values. Generally, the size of the array of input data values can be smaller than the size of the array of output data values.

A “position dependent kernel” as used herein means a kernel whose kernel values can depend on the respective position or element of the array of input data values. In other words, for a first kernel used for a first input data value of the array of input data values the kernel values can differ from the kernel values of a second kernel used for a second input data value of the array of input data values. In a two-dimensional array the position could be a spatial position defined, for instance, by two spatial coordinates x, y. In a one-dimensional array the position could be a temporal position defined, for instance, by a time coordinate t.
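
As a minimal sketch of this idea for the one-dimensional (temporal) case, each input data value in(t) can be multiplied with its own kernel to form a sub-array of the same size as the kernel, and the sub-arrays can be accumulated into the output. The function name `apply_position_dependent_kernels_1d`, the placement rule (offset i of the kernel at position t contributes to output position t − i) and the output length T + 2r are assumptions chosen for illustration; the text above does not fix them.

```python
import numpy as np

def apply_position_dependent_kernels_1d(inp, kernels):
    """inp: (T,) input values; kernels: (T, 2r+1), one kernel per position t.

    Returns an array of length T + 2r; output position p is stored at
    index p + r so that positions -r .. T-1+r fit into the array.
    """
    T, k = kernels.shape
    r = k // 2
    out = np.zeros(T + 2 * r)
    for t in range(T):
        # Sub-array of output values with the same size as the kernel.
        sub = inp[t] * kernels[t]
        for i in range(-r, r + 1):
            out[t - i + r] += sub[i + r]
    return out
```

Because `kernels[t]` may differ for every t, two neighbouring input values can spread their influence over the output in entirely different ways, which is not possible with a single shared kernel.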

Thus, an improved data processing apparatus based on neural networks is provided. The data processing apparatus allows upscaling or deconvolving the input data in a way that can better reflect mutual data similarity. Moreover, the data processing apparatus allows adapting the kernel weights for different spatial positions of the array of input data values. This, in turn, allows, for instance, minimizing the influence of some of the input data values on the result, for instance the input data values that are associated with another part of the scene (as determined by semantic segmentation) or a different object that is being analysed.

In a further embodiment of the first aspect, the neural network comprises at least one additional network layer configured to generate the plurality of position dependent kernels on the basis of an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input values. The original array of original input values can be the array of input data values or a different array.

In a further embodiment of the first aspect, the neural network is configured to generate the plurality of position dependent kernels based on a plurality of learned position independent kernels and a plurality of position dependent weights (also referred to as similarity features). Generally, the position independent kernels can be learned by the neural network and the position dependent weights (i.e. similarity features) can be computed, for instance, by a further preceding layer of the neural network. This embodiment allows minimizing the amount of data being transferred to the neural network layer in order to obtain the kernel values. This is because the kernel values are not transferred directly, but computed from the plurality of position dependent weights (i.e. similarity features), substantially reducing the amount of data for each element of the array of output data values. This can minimize the amount of data being stored and transferred by the neural network between the different network layers, which is especially important during the learning process on the basis of the mini-batch approach as the memory of the data processing apparatus (GPU) is currently the main bottleneck. Moreover, this embodiment allows for a better adaption of the kernel values to the processed data and utilizing more sophisticated similarity features. For instance, information about object shapes or object segmentations can be utilized in order to better preserve object boundaries or even increase the level of details in the higher-resolution output. In this way, information about some small details from the original array of original input values not present in the possibly low-resolution array of input data values can be combined with the array of input data values in order to create a higher-resolution array of output data values.

In a further embodiment of the first aspect, the neural network is configured to generate a kernel of the plurality of position dependent kernels by adding the learned position independent kernels each weighted by the associated non-learned position dependent weights (i.e. similarity features). This embodiment provides a very efficient representation of the plurality of position dependent kernels using a linear combination of position independent “base kernels”.

In a further embodiment of the first aspect, the plurality of position independent kernels are predetermined or learned, and the neural network comprises at least one additional neural network layer or “conventional” pre-processing layer configured to generate the plurality of position dependent weights (i.e. similarity features) based on an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input values. The original array of original input values can be the array of input data values or a different array. In an embodiment, the at least one additional neural network layer or “conventional” pre-processing layer can generate the plurality of position dependent weights (i.e. similarity features) using, for instance, bilateral filtering, semantic segmentation, per-instance object detection, and data importance indicators like ROI (region of interest).

In a further embodiment of the first aspect, the array of input data values and the array of output data values are two-dimensional arrays, and the convolutional neural network layer is configured to generate the plurality of position dependent kernels w_(L)(x,y,i,j) on the basis of the following equation:

$w_L(x,y,i,j) = \sum_{f=1}^{N_f} F_f(x,y) \cdot K_f(i,j),$

wherein F_(f)(x,y) denotes the plurality of N_(f) position dependent weights (i.e. similarity features) and K_(f)(i,j) denotes the plurality of position independent “base” kernels.
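
The linear combination of base kernels in the equation above can be illustrated with a single tensor contraction. In the following sketch the function name `position_dependent_kernels` and the array layouts (features first for F and K, position first for the result) are assumptions made for illustration.

```python
import numpy as np

def position_dependent_kernels(F, K):
    """w_L(x,y,i,j) = sum_f F_f(x,y) * K_f(i,j).

    F: (N_f, H, W)        position dependent weights (similarity features).
    K: (N_f, 2r+1, 2r+1)  position independent base kernels.
    Returns w of shape (H, W, 2r+1, 2r+1).
    """
    return np.einsum("fxy,fij->xyij", F, K)
```

Under these layouts only N_f values per position need to be passed to the layer instead of (2r+1)×(2r+1) kernel values, which is where a data reduction can arise when N_f is smaller than the number of kernel values.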

In a further embodiment of the first aspect, the neural network layer isa deconvolutional network layer or an upscaling network layer.

In a further embodiment of the first aspect, the array of input datavalues and the array of output data values are two-dimensional arrays,wherein the neural network layer is a deconvolution network layerconfigured to generate the array of output data values on the basis ofthe following equations:

${{{out}\mspace{11mu} \left( {x,y,c_{o}} \right)} = {\frac{1}{W_{L^{\prime}}\left( {x,y,c_{o}} \right)}{\sum_{c_{i} = 1}^{C_{i}}{\sum_{{{{\{{x^{\prime},y^{\prime}}\}}:{x^{\prime} - i}} = x},{{y^{\prime} - j} = y}}{{in}\mspace{11mu} \left( {x^{\prime},y^{\prime},c_{i}} \right){w_{L}\left( {x^{\prime},y^{\prime},c_{o},c_{i},i,j} \right)}}}}}},{{W_{L}^{\prime}\left( {x,y,c_{o}} \right)} = {\sum_{c_{i} = 1}^{C_{i}}{\sum_{{{{\{{x^{\prime},y^{\prime}}\}}:{x^{\prime} - i}} = x},{{y^{\prime} - j} = y}}{w_{L}\left( {x^{\prime},y^{\prime},c_{o},c_{i},i,j} \right)}}}},{i \in \left\{ {{- r},\ldots \;,r} \right\}},{j \in \left\{ {{- r},\ldots \;,r} \right\}},$

wherein x,y,x′,y′,i,j denote array indices, out(x,y,c_(o)) denotes themulti-channel array of output data values, in(x′,y′,c_(i)) denotes thearray of input data values, r denotes a size of each kernel of theplurality of position dependent multi-channel kernelsw_(L)(x′,y′,c_(o),c_(i),i,j) and W_(L)′(x,y,c_(o)) denotes anormalization factor. In an embodiment, the normalization factorW_(L)′(x,y,c_(o)) can be set equal to 1.
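
A possible reading of the multi-channel deconvolution equations is a scatter-and-accumulate operation: each input value in(x′,y′,c_i) contributes, through the kernel value for offset (i,j), to the output position (x′−i, y′−j). The sketch below follows the equations literally, without any stride or scale factor. The function name `deconv_position_dependent`, the array layouts, the storage of output position (x,y) at index (x+r, y+r), and the choice to output 0 where the normalization factor vanishes are all assumptions of this sketch.

```python
import numpy as np

def deconv_position_dependent(inp, w, normalize=True, eps=1e-12):
    """inp: (H, W, C_i); w: (H, W, C_o, C_i, 2r+1, 2r+1).

    Returns out of shape (H + 2r, W + 2r, C_o).
    """
    H, W, _ = inp.shape
    C_o, k = w.shape[2], w.shape[4]
    r = k // 2
    num = np.zeros((H + 2 * r, W + 2 * r, C_o))
    den = np.zeros_like(num)                      # accumulates W_L'(x,y,c_o)
    for i in range(-r, r + 1):
        for j in range(-r, r + 1):
            w_ij = w[:, :, :, :, i + r, j + r]    # shape (H, W, C_o, C_i)
            # Sum over input channels c_i of in(x',y',c_i) * w_L(...).
            contrib = np.einsum("xyc,xyoc->xyo", inp, w_ij)
            xs, ys = r - i, r - j                 # stored index of x' - i, y' - j
            num[xs:xs + H, ys:ys + W] += contrib
            den[xs:xs + H, ys:ys + W] += w_ij.sum(axis=3)
    if not normalize:                             # normalization factor set to 1
        return num
    return np.divide(num, den, out=np.zeros_like(num), where=np.abs(den) > eps)
```

With normalization enabled, each output value is a weighted average of the contributing input values, so a constant input yields the same constant output wherever the accumulated weight is non-zero.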

In a further embodiment of the first aspect, the array of input data values and the array of output data values are two-dimensional arrays, wherein the neural network layer is an upscaling network layer configured to generate the array of output data values on the basis of the following equations:

$\mathrm{out}(x,y) = \frac{1}{W_L'(x,y)} \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} \mathrm{in}(x',y')\, w_L(x',y',i,j),$

$W_L'(x,y) = \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} w_L(x',y',i,j), \quad i \in \{-r,\ldots,r\}, \; j \in \{-r,\ldots,r\},$

wherein x,y,x′,y′,i,j denote array indices, out(x,y) denotes the array of output data values, in(x′,y′) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w_(L)(x′,y′,i,j) and W_(L)′(x,y) denotes a normalization factor. In an embodiment, the normalization factor W_(L)′(x,y) can be set equal to 1. As will be appreciated, the sum in the equation above extends over every possible position (x′,y′) of the array of input data values, where x′ and y′ meet the conditions: x′−i=x and y′−j=y. In this way, overlapping positions of different position dependent kernels are obtained that are summed to generate the final output data value out(x,y).
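
The summation condition x′−i=x, y′−j=y can also be read from the point of view of a single output position: for each offset (i,j), the contributing input position is (x+i, y+j), provided it lies inside the input array. The following deliberately simple, loop-based sketch makes this explicit. The function name `upscale_gather`, the output range (positions −r to H−1+r, stored with an offset of r) and the handling of a vanishing normalization factor are assumptions, and the equations are followed literally without a stride.

```python
import numpy as np

def upscale_gather(inp, w, normalize=True):
    """inp: (H, W); w: (H, W, 2r+1, 2r+1) with w[x', y', i + r, j + r]."""
    H, W = inp.shape
    r = w.shape[2] // 2
    out = np.zeros((H + 2 * r, W + 2 * r))
    for x in range(-r, H + r):
        for y in range(-r, W + r):
            num = den = 0.0
            for i in range(-r, r + 1):
                for j in range(-r, r + 1):
                    xp, yp = x + i, y + j          # x' - i = x, y' - j = y
                    if 0 <= xp < H and 0 <= yp < W:
                        num += inp[xp, yp] * w[xp, yp, i + r, j + r]
                        den += w[xp, yp, i + r, j + r]
            if not normalize:                      # normalization factor set to 1
                out[x + r, y + r] = num
            elif den != 0.0:
                out[x + r, y + r] = num / den
    return out
```

For a single input value, the unnormalized output is the input value times the point-mirrored kernel, which shows how one input value produces an output sub-array of the same size as its kernel.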

In a further embodiment of the first aspect, the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate the array of output data values on the basis of the following equations:

$\mathrm{out}(x,y) = \frac{1}{W_L'(x,y)} \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} \mathrm{in}(x',y')\, \mathrm{sel}(x',y',i,j),$

$W_L'(x,y) = \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} \mathrm{sel}(x',y',i,j), \quad i \in \{-r,\ldots,r\}, \; j \in \{-r,\ldots,r\},$

$\mathrm{sel}(x,y,i,j) = \begin{cases} 1, & w_L(x,y,i,j) \text{ is max or min weight of all } w_L(x,y,k,l), \; k \in \{-r,\ldots,r\}, \; l \in \{-r,\ldots,r\} \\ 0, & \text{otherwise} \end{cases}$

wherein x,y,x′,y′,i,j,k,l denote array indices, out(x,y) denotes the array of output data values, in(x′,y′) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w_(L)(x,y,i,j), sel(x,y,i,j) denotes a selection function and W_(L)′(x,y) denotes a normalization factor. In an embodiment, the normalization factor W_(L)′(x,y) can be set equal to 1.
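
As an illustrative sketch, the selection function sel(x,y,i,j) of this embodiment can be computed by comparing every kernel value with the extreme value of its own kernel. The function name `select_within_kernel` and the `mode` parameter are hypothetical; marking all tied entries with 1 is an assumption that follows the equation as written.

```python
import numpy as np

def select_within_kernel(w, mode="max"):
    """w: (H, W, 2r+1, 2r+1). Returns sel with the same shape (values 0 or 1)."""
    if mode == "max":
        extreme = w.max(axis=(2, 3), keepdims=True)
    elif mode == "min":
        extreme = w.min(axis=(2, 3), keepdims=True)
    else:
        raise ValueError("mode must be 'max' or 'min'")
    # 1 where the kernel value equals the extreme of its own kernel, else 0.
    return (w == extreme).astype(float)
```

The resulting array can then take the place of the kernel values in the output equation, so that each input value is forwarded only to the output offset that its kernel selects.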

In a further embodiment of the first aspect, the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate the array of output data values on the basis of the following equations:

$\mathrm{out}(x,y) = \frac{1}{W_L'(x,y)} \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} \mathrm{in}(x',y')\, \mathrm{sel}(x,y,x',y',i,j),$

$W_L'(x,y) = \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} \mathrm{sel}(x,y,x',y',i,j), \quad i \in \{-r,\ldots,r\}, \; j \in \{-r,\ldots,r\},$

$\mathrm{sel}(x,y,x',y',i,j) = \begin{cases} 1, & w_L(x',y',i,j) \text{ is maximum weight of all } w_L(x'',y'',k,l), \; \{x'',y''\}:\; x''-k=x, \; y''-l=y, \; k \in \{-r,\ldots,r\}, \; l \in \{-r,\ldots,r\} \\ 0, & \text{otherwise} \end{cases}$

wherein x,y,x′,y′,x″,y″,i,j,k,l denote array indices, out(x,y) denotes the array of output data values, in(x′,y′) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w_(L)(x′,y′,i,j), sel(x,y,x′,y′,i,j) denotes a selection function and W_(L)′(x,y) denotes a normalization factor. In an embodiment, the normalization factor W_(L)′(x,y) can be set equal to 1.
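
In this embodiment the selection is made per output position: among all kernel values that overlap at (x,y), only the input value with the maximum weight is kept. A minimal sketch, assuming finite weights, is given below. The function name `upscale_select_max_overlap` and the output layout (position (x,y) stored at index (x+r, y+r)) are assumptions; in addition, ties are resolved in favour of the first candidate encountered, whereas the equations with normalization would average tied candidates.

```python
import numpy as np

def upscale_select_max_overlap(inp, w):
    """inp: (H, W); w: (H, W, 2r+1, 2r+1). Returns out of shape (H+2r, W+2r)."""
    H, W = inp.shape
    r = w.shape[2] // 2
    best_w = np.full((H + 2 * r, W + 2 * r), -np.inf)   # best weight seen so far
    out = np.zeros((H + 2 * r, W + 2 * r))
    for i in range(-r, r + 1):
        for j in range(-r, r + 1):
            xs, ys = r - i, r - j                 # stored index of x' - i, y' - j
            w_ij = w[:, :, i + r, j + r]
            region_w = best_w[xs:xs + H, ys:ys + W]     # views into the outputs
            region_o = out[xs:xs + H, ys:ys + W]
            better = w_ij > region_w
            region_w[better] = w_ij[better]
            region_o[better] = inp[better]
    return out
```

Each output value is thus a copy of exactly one input value, namely the one whose kernel assigns the largest weight to that output position.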

According to a second aspect, the invention relates to a corresponding data processing method comprising the operation of generating by a neural network layer of a neural network from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of different input data values of the array of input data values.

In a further embodiment of the second aspect, the method comprises the further operation of generating the plurality of position dependent kernels by an additional neural network layer of the neural network based on an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input values.

In a further embodiment of the second aspect, the operation of generating the plurality of position dependent kernels comprises generating the plurality of position dependent kernels based on a plurality of position independent kernels and a plurality of position dependent weights.

In a further embodiment of the second aspect, the operation of generating the plurality of position dependent kernels comprises the operation of adding, i.e. summing, the position independent kernels weighted by the associated position dependent weights.

In a further embodiment of the second aspect, the plurality of position independent kernels are predetermined or learned and the operation of generating the plurality of position dependent weights comprises the operation of generating the plurality of position dependent weights by an additional neural network layer or a processing layer of the neural network based on an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input values.

In a further embodiment of the second aspect, the array of input data values and the array of output data values are two-dimensional arrays, and the operation of generating a kernel of the plurality of position dependent kernels w_(L)(x,y,i,j) is based on the following equation:

$w_L(x,y,i,j) = \sum_{f=1}^{N_f} F_f(x,y) \cdot K_f(i,j),$

wherein F_(f)(x,y) denotes the plurality of N_(f) position dependent weights (i.e. similarity features) and K_(f)(i,j) denotes the plurality of position independent kernels.

In a further embodiment of the second aspect, the neural network layer is a deconvolutional network layer or an upscaling network layer.

In a further embodiment of the second aspect, the array of input data values and the array of output data values are two-dimensional arrays, wherein the neural network layer is a deconvolution network layer and the operation of generating the array of output data values comprises generating the array of output data values on the basis of the following equations:

$\mathrm{out}(x,y,c_o) = \frac{1}{W_L'(x,y,c_o)} \sum_{c_i=1}^{C_i} \; \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} \mathrm{in}(x',y',c_i)\, w_L(x',y',c_o,c_i,i,j),$

$W_L'(x,y,c_o) = \sum_{c_i=1}^{C_i} \; \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} w_L(x',y',c_o,c_i,i,j), \quad i \in \{-r,\ldots,r\}, \; j \in \{-r,\ldots,r\},$

wherein x,y,x′,y′,i,j denote array indices, out(x,y,c_(o)) denotes the multi-channel array of output data values, in(x′,y′,c_(i)) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent multi-channel kernels w_(L)(x′,y′,c_(o),c_(i),i,j) and W_(L)′(x,y,c_(o)) denotes a normalization factor. In one embodiment, the normalization factor W_(L)′(x,y,c_(o)) can be set equal to 1.

In a further embodiment of the second aspect, the array of input data values and the array of output data values are two-dimensional arrays, wherein the neural network layer is an upscaling network layer and the operation of generating the array of output data values comprises generating the array of output data values on the basis of the following equations:

$\mathrm{out}(x,y) = \frac{1}{W_L'(x,y)} \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} \mathrm{in}(x',y')\, w_L(x',y',i,j),$

$W_L'(x,y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x',y',i,j), \quad \{x',y'\}:\; x'-i=x, \; y'-j=y, \quad i \in \{-r,\ldots,r\}, \; j \in \{-r,\ldots,r\},$

wherein x,y,x′,y′,i,j denote array indices, out(x,y) denotes the array of output data values, in(x′,y′) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w_(L)(x′,y′,i,j) and W_(L)′(x,y) denotes a normalization factor. In an embodiment, the normalization factor W_(L)′(x,y) can be set equal to 1.

In a further embodiment of the second aspect, the array of input data values and the array of output data values are two-dimensional arrays and the operation of generating the array of output data values comprises generating the array of output data values on the basis of the following equations:

$\mathrm{out}(x,y) = \frac{1}{W_L'(x,y)} \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} \mathrm{in}(x',y')\, \mathrm{sel}(x',y',i,j),$

$W_L'(x,y) = \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} \mathrm{sel}(x',y',i,j), \quad i \in \{-r,\ldots,r\}, \; j \in \{-r,\ldots,r\},$

$\mathrm{sel}(x,y,i,j) = \begin{cases} 1, & w_L(x,y,i,j) \text{ is max or min weight of all } w_L(x,y,k,l), \; k \in \{-r,\ldots,r\}, \; l \in \{-r,\ldots,r\} \\ 0, & \text{otherwise} \end{cases}$

wherein x,y,x′,y′,i,j,k,l denote array indices, out(x,y) denotes the array of output data values, in(x′,y′) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w_(L)(x,y,i,j), sel(x,y,i,j) denotes a selection function and W_(L)′(x,y) denotes a normalization factor. In an embodiment, the normalization factor W_(L)′(x,y) can be set equal to 1.

In a further embodiment of the second aspect, the array of input data values and the array of output data values are two-dimensional arrays and the operation of generating the array of output data values comprises generating the array of output data values on the basis of the following equations:

$\mathrm{out}(x,y) = \frac{1}{W_L'(x,y)} \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} \mathrm{in}(x',y')\, \mathrm{sel}(x,y,x',y',i,j),$

$W_L'(x,y) = \sum_{\{x',y'\}:\; x'-i=x,\; y'-j=y} \mathrm{sel}(x,y,x',y',i,j), \quad i \in \{-r,\ldots,r\}, \; j \in \{-r,\ldots,r\},$

$\mathrm{sel}(x,y,x',y',i,j) = \begin{cases} 1, & w_L(x',y',i,j) \text{ is maximum weight of all } w_L(x'',y'',k,l), \; \{x'',y''\}:\; x''-k=x, \; y''-l=y, \; k \in \{-r,\ldots,r\}, \; l \in \{-r,\ldots,r\} \\ 0, & \text{otherwise} \end{cases}$

wherein x,y,x′,y′,x″,y″,i,j,k,l denote array indices, out(x,y) denotes the array of output data values, in(x′,y′) denotes the array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w_(L)(x′,y′,i,j), sel(x,y,x′,y′,i,j) denotes a selection function and W_(L)′(x,y) denotes a normalization factor. In an embodiment, the normalization factor W_(L)′(x,y) can be set equal to 1.

According to a third aspect, the invention relates to a computer program comprising program code for performing the method according to the second aspect, when executed on a processor or a computer.

The invention can be implemented in hardware and/or software.

BRIEF DESCRIPTION OF THE DRAWINGS

Further embodiments of the invention will be described with respect to the following figures, wherein:

FIG. 1 shows a schematic diagram illustrating a data processing apparatus based on a neural network according to an embodiment;

FIG. 2 shows a schematic diagram illustrating a neural network provided by a data processing apparatus according to an embodiment;

FIG. 3 shows a schematic diagram illustrating the concept of up-scaling of data implemented in a data processing apparatus according to an embodiment;

FIG. 4 shows a schematic diagram illustrating an up-scaling operation provided by a neural network of a data processing apparatus according to an embodiment;

FIG. 5 shows a schematic diagram illustrating different aspects of a neural network provided by a data processing apparatus according to an embodiment;

FIG. 6 shows a schematic diagram illustrating different aspects of a neural network provided by a data processing apparatus according to an embodiment;

FIG. 7 shows a schematic diagram illustrating different processing operations of a data processing apparatus according to an embodiment;

FIG. 8 shows a schematic diagram illustrating a neural network provided by a data processing apparatus according to an embodiment;

FIG. 9 shows a schematic diagram illustrating different aspects of a neural network provided by a data processing apparatus according to an embodiment;

FIG. 10 shows a schematic diagram illustrating different processing operations of a data processing apparatus according to an embodiment; and

FIG. 11 shows a flow diagram illustrating a neural network data processing method according to an embodiment.

In the various figures, identical reference signs will be used for identical or at least functionally equivalent features.

DETAILED DESCRIPTION OF EMBODIMENTS

In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and in which are shown, by way of illustration, aspects in which the embodiments of the invention may be placed. It is understood that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the embodiments of the invention. The following detailed description, therefore, is not to be taken in a limiting sense, as the scope of the embodiments of the invention is defined by the appended claims.

For instance, it is understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a method operation is described, a corresponding device may include a unit to perform the described method operation, even if such unit is not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless noted otherwise.

FIG. 1 shows a schematic diagram illustrating a data processing apparatus 100 according to an embodiment configured to process data on the basis of a neural network. To this end, the data processing apparatus 100 shown in FIG. 1 comprises a processor 101. In an embodiment, the data processing apparatus 100 can be implemented as a distributed data processing apparatus 100 comprising more than the one processor 101 shown in FIG. 1.

The processor 101 of the data processing apparatus 100 is configured to provide a neural network 110. As will be described in more detail further below, the neural network 110 comprises a neural network layer being configured to generate from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of different input data values of the array of input data values. As shown in FIG. 1, the data processing apparatus 100 can further comprise a memory 103 for storing and/or retrieving the input data values, the output data values and/or the kernels.

Each kernel comprises a plurality of kernel values (also referred to as kernel weights). For a respective position or element of the array of input data values a respective kernel is applied thereto for generating a respective sub-array of the array of output data values. Generally, the size of the array of input data values is smaller than the size of the array of output data values. A “position dependent kernel” as used herein means a kernel whose kernel values depend on the respective position or element of the array of input data values. In other words, for a first kernel used for a first input data value of the array of input data values the kernel values can differ from the kernel values of a second kernel used for a second input data value of the array of input data values. In a two-dimensional array the position could be a spatial position defined, for instance, by two spatial coordinates x, y. In a one-dimensional array the position could be a temporal position defined, for instance, by a time coordinate t.

The array of input data values can be one-dimensional (i.e. a vector, e.g. an audio signal or another temporal sequence), two-dimensional (i.e. a matrix, e.g. an image or other temporal or spatial sequence), or N-dimensional (e.g. any kind of N-dimensional feature array, e.g. provided by a conventional pre-processing or feature extraction and/or by other layers of the neural network 110). The array of input data values can have one or more channels, e.g. for an RGB image one R-channel, one G-channel and one B-channel, or for a black/white image only one grey-scale or intensity channel. The term “channel” can refer to any “feature”, e.g. features obtained from conventional pre-processing or feature extraction or from other neural networks or neural network layers of the neural network 110. The array of input data values can comprise, for instance, two-dimensional RGB or grey scale image or video data representing at least a part of an image, or a one-dimensional audio signal. In case the neural network layer 120 is implemented as an intermediate layer of the neural network 110, the array of input data values can be, for instance, an array of similarity features generated by previous layers of the neural network on the basis of an initial, i.e. original array of input data values, e.g. by means of a feature extraction, as will be described in more detail further below.

As will be described in more detail below, the neural network layer 120 can be implemented as an up-scaling layer 120 configured to process each channel of the array of input data values separately, e.g. for an input array of R-values one (scalar) R-output value is generated. The position dependent kernels may be channel-specific or common for all channels. Moreover, the neural network layer 120 can be implemented as a deconvolution (or deconvolutional) layer configured to “mix” all channels of the array of input data values. For instance, in case the generated array of output data values is an RGB image, i.e. a multi-channel array, every single channel of a multi-channel input data array is used to generate all three channels of the multi-channel array of output data values. The position dependent kernels may be channel-specific, i.e. multi-channel arrays, or common for all channels.

FIG. 2 shows a schematic diagram illustrating elements of the neural network 110 provided by the data processing apparatus 100 according to an embodiment. In the embodiment shown in FIG. 2, the neural network layer 120 is implemented as an up-scaling layer 120. In a further embodiment, the neural network layer 120 can be implemented as a deconvolution layer 120 (also referred to as deconvolutional layer 120), as will be described in more detail further below. As indicated in FIG. 2, in this embodiment the up-scaling layer 120 is configured to generate a two-dimensional array of output data values out(x,y) 121 on the basis of the two-dimensional array of input data values in(x,y) 117 and the plurality of position dependent kernels 118 comprising a plurality of kernel values or kernel weights.

In an embodiment, the up-scaling layer 120 of the neural network 110 shown in FIG. 2 is configured to generate the array of output data values out(x,y) 121 on the basis of the array of input data values in(x,y) 117 and the plurality of position dependent kernels 118 comprising the kernel values w_(L)(x,y,i,j) using the following equations:

${{{out}\mspace{11mu} \left( {x,y,} \right)} = {\frac{1}{W_{L^{\prime}}\left( {x,y} \right)}{\sum_{{{{\{{x^{\prime},y^{\prime}}\}}:{x^{\prime} - i}} = x},{{y^{\prime} - j} = y}}{{in}\mspace{11mu} \left( {x^{\prime},y^{\prime}} \right){w_{L}\left( {x^{\prime},y^{\prime},i,j} \right)}}}}},{{W_{L}^{\prime}\left( {x,y} \right)} = {\sum_{{{{\{{x^{\prime},y^{\prime}}\}}:{x^{\prime} - i}} = x},{{y^{\prime} - j} = y}}{w_{L}\left( {x^{\prime},y^{\prime},i,j} \right)}}},{i \in \left\{ {{- r},\ldots \;,r} \right\}},{j \in \left\{ {{- r},\ldots \;,r} \right\}},$

wherein x,y,x′,y′,i,j denote array indices, out(x,y) denotes the array of output data values 121, in(x′,y′) denotes the array of input data values 117, r denotes a size of each kernel of the plurality of position dependent kernels w_(L)(x′,y′,i,j) 118 (in this example, each kernel has (2r+1)*(2r+1) kernel values) and W_(L)′(x,y) denotes a normalization factor and can be set to 1. As will be appreciated, the sum in the equation above extends over every possible position (x′,y′) of the array of input data values 117, where x′ and y′ meet the conditions: x′−i=x and y′−j=y. In this way, overlapping positions of different position dependent kernels 118 are obtained that are summed to generate the final output data value out(x,y).

In other embodiments, the normalization factor can be omitted, i.e. set to one. For instance, in case the neural network layer 120 is implemented as a deconvolutional network layer the normalization factor can be omitted. For upscaling, the normalization factor makes it possible to preserve the DC component. This is usually not required in the case of the deconvolutional network layer 120.

As will be appreciated, the above equations for a two-dimensional input array and a kernel having a quadratic shape can be easily adapted to the case of an array of input values 117 having one dimension or more than two dimensions and/or a kernel having a rectangular shape, i.e. different horizontal and vertical dimensions.
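As an illustration of the one-dimensional adaptation, the following is a minimal NumPy sketch, not taken from the disclosure. The function name `upscale_1d`, the storage layout `kernels[t', i + r] = w_L(t', i)`, and the choice to store output position t at array index t + r (so that no contribution is dropped at the borders) are assumptions made for this example only.

```python
import numpy as np

def upscale_1d(inp, kernels, normalize=True):
    """One-dimensional analogue of the up-scaling equations (illustrative only).

    inp     : shape (T,), input data values in(t')
    kernels : shape (T, 2r+1), kernels[t', i + r] = w_L(t', i)  (assumed layout)
    returns : shape (T + 2r,), output position t stored at index t + r
    """
    T = inp.shape[0]
    r = kernels.shape[1] // 2
    out = np.zeros(T + 2 * r)
    norm = np.zeros(T + 2 * r)
    for i in range(-r, r + 1):
        # Input position t' with offset i contributes to output t = t' - i.
        out[r - i : r - i + T] += inp * kernels[:, i + r]
        norm[r - i : r - i + T] += kernels[:, i + r]
    if normalize:
        # Divide by the summed weights W_L'(t); leave positions with zero sum as is.
        out = out / np.where(norm != 0, norm, 1.0)
    return out
```

With two input values 1 and 2 and all kernel weights equal to 1 (r = 1), the overlapping positions receive the average of both inputs, while the outer positions receive a single input each.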

For an embodiment, where the neural network layer 120 is implemented as a deconvolution layer and the array of input data values in(x,y,c_(i)) 117 is a two-dimensional array of input data values, the deconvolutional layer 120 is configured to generate the array of output data values 121 as a multi-channel array of output data values out(x,y,c_(o)) 121, i.e. an array having more than one channel c_(o). In this case, also the plurality of position dependent kernels 118 will have the corresponding number of channels, wherein each multi-channel position dependent kernel comprises the kernel values w_(L)(x′,y′,c_(o),c_(i),i,j). For instance, the deconvolutional layer 120 could be configured to deconvolve a monochromatic image into an RGB image with higher resolution using a plurality of position dependent kernels 118 having three channels.

In an embodiment, the deconvolutional layer 120 is configured to generate the multi-channel array of output data values out(x,y,c_(o)) 121 on the basis of the array of input data values in(x,y,c_(i)) 117 having one or more channels and the plurality of multi-channel position dependent kernels 118 comprising the kernel values w_(L)(x′,y′,c_(o),c_(i),i,j) using the following equations:

${{{out}\left( {x,y,c_{o}} \right)} = {\frac{1}{W_{L^{\prime}}\left( {x,y,c_{o}} \right)}{\sum\limits_{c_{i} = 1}^{C_{i}}{\sum\limits_{{{{{\{{x^{\prime},y^{\prime}}\}}\text{:}x^{\prime}} - i} = x},{{y^{\prime} - j} = y}}{{{in}\left( {x^{\prime},y^{\prime},c_{i}} \right)}{w_{L}\left( {x^{\prime},y^{\prime},c_{o},c_{i},i,j} \right)}}}}}},\mspace{20mu} {{W_{L}^{\prime}\left( {x,y,c_{o}} \right)} = {\sum\limits_{c_{i} = 1}^{C_{i}}{\sum\limits_{{{{{\{{x^{\prime},y^{\prime}}\}}\text{:}x^{\prime}} - i} = x},{{y^{\prime} - j} = y}}{w_{L}\left( {x^{\prime},y^{\prime},c_{o},c_{i},i,j} \right)}}}},\mspace{20mu} {i \in \left\{ {{- r},\ldots \mspace{14mu},r} \right\}},{j \in \left\{ {{- r},\ldots \mspace{14mu},r} \right\}},$

wherein x,y,x′,y′,i,j denote array indices, r denotes a size of each kernel of the plurality of position dependent kernels 118 and W_(L)′(x,y,c_(o)) denotes a normalization factor. In other embodiments, the normalization factor can be omitted, i.e. set to one.
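The multi-channel equations can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation of the disclosure: the function name `deconv_position_dependent`, the array layout `kernels[x', y', c_o, c_i, i + r, j + r]`, and the padded output (output position (x,y) stored at index (x + r, y + r)) are choices made here. Normalization is off by default, matching the remark that it is usually omitted for the deconvolutional layer.

```python
import numpy as np

def deconv_position_dependent(inp, kernels, normalize=False):
    """Position dependent multi-channel deconvolution (illustrative only).

    inp     : shape (H, W, Ci), input data values in(x', y', c_i)
    kernels : shape (H, W, Co, Ci, 2r+1, 2r+1), assumed layout
              kernels[x', y', c_o, c_i, i + r, j + r] = w_L(x', y', c_o, c_i, i, j)
    returns : shape (H + 2r, W + 2r, Co)
    """
    H, W, _ = inp.shape
    Co = kernels.shape[2]
    r = kernels.shape[4] // 2
    out = np.zeros((H + 2 * r, W + 2 * r, Co))
    norm = np.zeros_like(out)
    for i in range(-r, r + 1):
        for j in range(-r, r + 1):
            w = kernels[:, :, :, :, i + r, j + r]          # (H, W, Co, Ci)
            xs = slice(r - i, r - i + H)                    # x = x' - i
            ys = slice(r - j, r - j + W)                    # y = y' - j
            # Sum over the input channels c_i for every output channel c_o.
            out[xs, ys, :] += np.einsum("xyc,xyoc->xyo", inp, w)
            norm[xs, ys, :] += w.sum(axis=3)
    if normalize:
        out = out / np.where(norm != 0, norm, 1.0)
    return out
```

For example, a single-channel (monochromatic) input with kernels that are non-zero only at the kernel center yields a three-channel output in which each channel is a scaled copy of the input.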

In an embodiment, the neural network layer 120 is configured to generate the array of output data values 121 with a larger size than the array of input data values 117. In other words, in an embodiment, the neural network 110 is configured to perform an up-step or upscaling operation of the array of input data values 117 on the basis of the plurality of position dependent kernels 118. FIG. 3 illustrates an up-step or upscaling operation provided by a neural network 110 of the data processing apparatus 100 according to an embodiment. Using an up-step or upscaling operation allows increasing the receptive field, enables processing the data with a cascade of smaller filters as compared with a single layer with a kernel covering an equal receptive field, and also enables the neural network 110 to better analyse the data by finding more sophisticated relationships among the data.

In the up-step or upscaling operation illustrated in FIG. 3 the neural network layer 120 can up-scale the input data produced by a preceding cascade of down-layers for generating an array of output data values having an increased resolution. This upscaling operation can be performed by deconvolving every channel of each spatial position of the array of input data values with position dependent kernels with a stride S greater than 1, producing a data volume of increased resolution. The stride S specifies the spacing between neighboring input spatial positions for which deconvolutions are computed. If the stride S is equal to 1, the deconvolution is performed for each spatial position. If the stride S is an integer greater than 1, deconvolution is performed for every S-th spatial position, increasing the output resolution by a factor of S for each spatial dimension.

In the exemplary embodiment shown in FIG. 3, the neural network layer 120 up-scales every element of the array of input data values 117 into a respective sub-array of the array of output data values 121 with a size of (2r+1)×(2r+1) (defined by the size of the position dependent kernels 118). In this way, the input data values 117 can be up-scaled to the higher resolution array of output data values 121.

According to an embodiment, the upscaling operation performed by the neural network layer 120 for the exemplary case of two-dimensional input and output arrays 117, 121 comprises multiplying a respective input data value of the array of input data values 117 with the plurality of kernel weights w_(L)(x,y,i,j) of a respective position dependent kernel 118. In case the respective position dependent kernel 118 has an exemplary size of (2r+1)×(2r+1) this operation will generate a sub-array of the array of output data values 121 (which can also be considered as an interpolation area) having also a size of (2r+1)×(2r+1). As will be appreciated, depending on the selected stride S, the interpolation areas of neighboring input data values may overlap. In order to handle such case, according to an embodiment, the values from all overlapping interpolation areas 122 located at the spatial position (x,y) (i.e. overlapping spatial position) can be aggregated and (optionally) normalized by a normalization factor producing the final output data value out(x,y). This operation is illustrated in FIG. 4 for the exemplary case of having R sub-arrays or interpolation areas at the spatial position (x,y).
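The multiply, aggregate and normalize steps with a stride S can be sketched as follows. The equations of the disclosure do not state how S enters the index mapping, so this sketch assumes that the interpolation area of input position (x′,y′) is centered at output position (S·x′, S·y′); the function name `upscale_strided` and the padded output layout are likewise assumptions.

```python
import numpy as np

def upscale_strided(inp, kernels, stride=2, normalize=True):
    """Strided position dependent up-scaling with aggregation (illustrative only).

    inp     : shape (H, W), input data values in(x', y')
    kernels : shape (H, W, 2r+1, 2r+1), kernels[x', y', i + r, j + r] = w_L(x', y', i, j)
    Assumption: input (x', y') with offset (i, j) lands on output
    (stride * x' - i, stride * y' - j), stored with an index offset of +r.
    """
    H, W = inp.shape
    r = kernels.shape[2] // 2
    span_x = stride * (H - 1) + 1
    span_y = stride * (W - 1) + 1
    out = np.zeros((span_x + 2 * r, span_y + 2 * r))
    norm = np.zeros_like(out)
    for i in range(-r, r + 1):
        for j in range(-r, r + 1):
            w = kernels[:, :, i + r, j + r]
            xs = slice(r - i, r - i + span_x, stride)
            ys = slice(r - j, r - j + span_y, stride)
            out[xs, ys] += inp * w     # aggregate overlapping interpolation areas
            norm[xs, ys] += w          # accumulate the normalization factor
    if normalize:
        out = out / np.where(norm != 0, norm, 1.0)
    return out
```

With S = 2 and r = 1, neighboring interpolation areas overlap along one row or column of output positions; with S = 2r+1 they tile the output without overlap, and normalization then reduces to dividing each output value by a single weight.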

In the embodiment shown in FIG. 2, the neural network 110 comprises one or more preceding layers 115 preceding the neural network layer 120 and one or more following layers 125 following the neural network layer 120. In an embodiment, the neural network layer 120 could be the first and/or the last data processing layer of the neural network 110, i.e. in an embodiment there could be no preceding layers 115 and/or no following layers 125.

In an embodiment, the one or more preceding layers 115 can be further neural network layers, such as a convolutional network layer, and/or “conventional” pre-processing layers, such as a feature extraction layer. Likewise, in an embodiment, the one or more following layers 125 can be further neural network layers and/or “conventional” post-processing layers.

As shown in FIG. 2, one or more of the preceding layers 115 can be configured to provide, i.e. to generate, the plurality of position dependent kernels 118. In an embodiment, the one or more layers of the preceding layers 115 can generate the plurality of position dependent kernels 118 on the basis of an original array of original input data values. As indicated in FIG. 2, in an embodiment, the original array of original input data values can be an array of input data 111 being the original input of the neural network 110. In another embodiment, the one or more preceding layers 115 could be configured to generate just the plurality of position dependent kernels 118 on the basis of the original input data 111 of the neural network 110 and to provide the original input data 111 of the neural network 110 as the array of input data values 117 to the neural network layer 120.

As indicated in FIG. 2, in a further embodiment, the one or more preceding layers 115 of the neural network 110 are configured to generate the plurality of position dependent kernels 118 on the basis of an array of guiding data 113. A more detailed view of the processing operations of the neural network 110 of the data processing apparatus 100 according to such an embodiment is shown in FIG. 5 for the exemplary case of two-dimensional input and output arrays. The array of guiding data 113 is used by the one or more preceding layers 115 of the neural network 110 to generate the plurality of position dependent kernels w_(L)(x,y) 118 on the basis of the array of guiding data g(x,y) 113. As already described in the context of FIG. 2, the neural network layer 120 is configured to generate the two-dimensional array of output data values out(x,y) 121 on the basis of the two-dimensional array of input data values in(x,y) 117 and the plurality of position dependent kernels w_(L)(x,y) 118, which, in turn, are based on the array of guiding data g(x,y) 113.

In an embodiment, the one or more preceding layers 115 of the neural network 110 are neural network layers configured to learn the plurality of position dependent kernels w_(L)(x,y) 118 on the basis of the array of guiding data g(x,y) 113. In another embodiment, the one or more preceding layers 115 of the neural network 110 are pre-processing layers configured to generate the plurality of position dependent kernels w_(L)(x,y) 118 on the basis of the array of guiding data 113 using one or more pre-processing schemes, such as feature extraction.

In an embodiment, the one or more preceding layers 115 of the neural network 110 are configured to generate the plurality of position dependent kernels w_(L)(x,y) 118 on the basis of the array of guiding data g(x,y) 113 in a way analogous to up-scaling based on bilateral filters, as illustrated in FIG. 6. In image processing, a common approach to perform data up-scaling is to use bilateral filter weights [M. Elad, “On the origin of bilateral filter and ways to improve it”, IEEE Transactions on Image Processing, vol. 11, no. 10, pp. 1141-1151, October 2002] as a sort of guiding information for interpolating the input data. The usage of bilateral filter weights has the advantage of decreasing the influence of input data values on some spatial positions of the interpolation results, while amplifying their influence for others. As illustrated in FIG. 6, the weights 618 utilized for up-scaling the array of input data values 617 adapt to the input data using the guiding image data g 613, which provides additional information to control the up-scaling process. In the up-scaling process, a single input data value of the array of input data values in(x,y) 617 is multiplied by the kernel w 618 of size (2r+1)×(2r+1) creating an interpolated area of output data out(x±r,y±r) 521 of size (2r+1)×(2r+1). As will be appreciated, however, the interpolation areas of neighbouring input positions may overlap. In order to handle such cases, values from different overlapping interpolation areas located at the spatial position x, y can be aggregated and normalized by a normalization factor W′(x,y) producing the final output value out(x,y). If the stride S is greater than 1, the spatial resolution of the output data created by the interpolation areas will be increased. Mathematically, this can be expressed in the following way:

${{out}\left( {x,y} \right)} = {\frac{1}{W^{\prime}\left( {x,y} \right)}{\sum\limits_{{{{\{{x^{\prime},y^{\prime}}\}}:{x^{\prime} - i}} = x},{{y^{\prime} - j} = y}}{{{in}\left( {x^{\prime},y^{\prime}} \right)}{w\left( {x^{\prime},y^{\prime},i,j} \right)}}}}$

where:

W′(x,y)=E _({x′,y′}:x′−i=x,y′−j=y) w(x′,y′,i,j),

i∈{r, . . . , r},j∈{r, . . . , r}.

In an embodiment, the bilateral filter weights 618 are defined by the following equation:

${{w\left( {x,y,i,j} \right)} = {e^{- \frac{{({x - i})}^{2} + {({y - j})}^{2}}{2\; w_{r}}}e^{- \frac{d{({{g{({{x - i},{y - j}})}},{g{({x,y})}}})}}{2w_{d}}}}},$

wherein d(⋅,⋅) denotes a distance function. Thus, the bilateral filter weights 618 can take into account the distance of the value within the kernel from the center of the kernel and, additionally, the similarity of the data values with data in the center of the kernel.
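A minimal NumPy sketch of how such bilateral weights could be computed from guiding data is given below. It is not taken from the disclosure and rests on several assumptions: the spatial term is taken to be the squared offset i²+j², following the prose description of the distance from the kernel center; the distance function d is taken to be the absolute difference; borders are handled by edge replication; and the function name `bilateral_kernels` is hypothetical.

```python
import numpy as np

def bilateral_kernels(guide, r, w_r, w_d):
    """Bilateral-style position dependent kernels from guiding data (illustrative only).

    guide   : shape (H, W), guiding data g(x, y)
    returns : shape (H, W, 2r+1, 2r+1), result[x, y, i + r, j + r] = w(x, y, i, j)
    """
    H, W = guide.shape
    padded = np.pad(guide, r, mode="edge")   # assumed border handling
    kernels = np.empty((H, W, 2 * r + 1, 2 * r + 1))
    for i in range(-r, r + 1):
        for j in range(-r, r + 1):
            # g(x - i, y - j) for every position (x, y) at once.
            neighbor = padded[r - i : r - i + H, r - j : r - j + W]
            spatial = np.exp(-(i * i + j * j) / (2.0 * w_r))
            # Assumed distance function d(a, b) = |a - b|.
            similarity = np.exp(-np.abs(neighbor - guide) / (2.0 * w_d))
            kernels[:, :, i + r, j + r] = spatial * similarity
    return kernels
```

At a step edge in the guiding data, the weights pointing across the edge become much smaller than those pointing along it, which is what keeps up-scaled values from bleeding over object borders.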

FIG. 7 shows a schematic diagram highlighting the main processing stage 701 of the data processing apparatus 100 according to an embodiment, for instance, the data processing apparatus 100 providing the neural network 110 shown in FIG. 2. As already described above, in a first processing operation 703 the neural network 110 can generate the plurality of position dependent kernels w_(L)(x,y) 118 on the basis of the array of guiding data g(x,y) 113. In a second processing operation 705 the neural network 110 can generate the array of output data values out(x,y) 121 on the basis of the array of input data values in(x,y) 117 and the plurality of position dependent kernels w_(L)(x,y,i,j) 118.

FIG. 8 shows a schematic diagram illustrating the neural network 110 provided by the data processing apparatus 100 according to a further embodiment. As will be described in more detail in the following, the main difference to the embodiment shown in FIG. 2 is that in the embodiment shown in FIG. 8 the neural network 110 is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels 119 b (shown in FIG. 9) and a plurality of position dependent weights F_(f)(x,y) 119 a (also referred to as similarity features 119 a). In an embodiment, the similarity features 119 a could indicate higher-level knowledge about the input data, including e.g. semantic segmentation, per-instance object detection, data importance indicators like ROI (Region of Interest) and many others, all learned by the neural network 110 itself or being an additional input to the neural network 110. In an embodiment, the neural network 110 of FIG. 8 is configured to generate the plurality of position dependent kernels 118 by adding the position independent kernels 119 b weighted by the associated position dependent weights F_(f)(x,y) 119 a.

In an embodiment, the plurality of position independent kernels 119 b can be predetermined or learned by the neural network 110. As illustrated in FIG. 8, also in this embodiment the neural network 110 can comprise one or more preceding layers 115, which precede the neural network layer 120 and which can be implemented as an additional neural network layer or a pre-processing layer. In an embodiment, one or more layers of the preceding layers 115 are configured to generate the plurality of position dependent weights F_(f)(x,y) 119 a on the basis of an original array of original input data values. The original array of original input data values of the neural network 110 can comprise the array of input data values 117 to be processed by the neural network layer 120 or another array of input data values 111 associated to the array of input data values 117, for instance, the initial array of input data 111.

In the exemplary embodiment shown in FIG. 8, the array of input data values in(x,y) 117 and the array of output data values out(x,y) 121 are two-dimensional arrays and the neural network layer 120 is configured to generate a respective kernel of the plurality of position dependent kernels w_(L)(x,y,i,j) 118 on the basis of the following equation:

$w_L(x,y,i,j) = \sum_{f=1}^{N_f} F_f(x,y) \cdot K_f(i,j),$

wherein F_(f)(x,y) denotes the set of N_(f) position dependent weights (or similarity features) 119 a and K_(f)(i,j) denotes the plurality of position independent kernels 119 b, as also illustrated in FIG. 9.
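The weighted sum above maps directly onto a single tensor contraction. The following NumPy sketch is illustrative only; the function name `kernels_from_basis` and the array layouts `F[f, x, y]` and `K[f, i + r, j + r]` are assumptions made for this example.

```python
import numpy as np

def kernels_from_basis(F, K):
    """Position dependent kernels as a weighted sum of position independent kernels.

    F       : shape (Nf, H, W), position dependent weights F_f(x, y)
    K       : shape (Nf, 2r+1, 2r+1), position independent kernels K_f(i, j)
    returns : shape (H, W, 2r+1, 2r+1), w_L(x, y, i, j) = sum_f F_f(x, y) * K_f(i, j)
    """
    return np.einsum("fxy,fij->xyij", F, K)
```

If the weights F_f form a one-hot segmentation map, each position simply receives the kernel K_f of its segment; soft weights blend the kernels. Only N_f small kernels and N_f weight maps need to be stored, rather than one full kernel per position.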

FIG. 10 shows a schematic diagram highlighting the main processing stage 1001 implemented in the data processing apparatus 100 according to an embodiment, for instance, the data processing apparatus 100 providing the neural network 110 illustrated in FIGS. 8 and 9. As already described above, in a first processing operation 1003 the neural network 110 can generate the plurality of position dependent weights or similarity features F_(f)(x,y) 119 a on the basis of the array of guiding data g(x,y) 113. In a second processing operation 1005 the neural network 110 can generate the plurality of position dependent kernels w_(L)(x,y,i,j) 118 on the basis of the plurality of position dependent weights or similarity features F_(f)(x,y) 119 a and the plurality of position independent kernels K_(f)(i,j) 119 b. In a further operation (not shown in FIG. 10, but similar to the processing operation 705 shown in FIG. 7) the neural network layer 120 can generate the array of output data values out(x,y) 121 on the basis of the array of input data values in(x,y) 117 and the plurality of position dependent kernels w_(L)(x,y,i,j) 118.

In a further embodiment, the neural network layer 120 is configured to process the array of input data values 117 on the basis of the plurality of position dependent kernels 118 using an “inverse” maximum or minimum pooling scheme. In one embodiment, the array of input data values 117 and the array of output data values 121 are two-dimensional arrays and the neural network layer 120 is configured to generate the array of output data values 121 on the basis of the following equations:

${{{out}\left( {x,y} \right)} = {\frac{1}{W_{L^{\prime}}\left( {x,y} \right)}{\sum\limits_{{{{{\{{x^{\prime},y^{\prime}}\}}\text{:}x^{\prime}} - i} = x},{{y^{\prime} - j} = y}}{{{in}\left( {x^{\prime},y^{\prime}} \right)}{{sel}\left( {x^{\prime},y^{\prime},i,j} \right)}}}}},\mspace{20mu} {{W_{L}^{\prime}\left( {x,y} \right)} = {\sum\limits_{{{{{\{{x^{\prime},y^{\prime}}\}}\text{:}x^{\prime}} - i} = x},{{y^{\prime} - j} = y}}{{sel}\left( {x^{\prime},y^{\prime},i,j} \right)}}},\mspace{20mu} {i \in \left\{ {{- r},\ldots \mspace{14mu},r} \right\}},{j \in \left\{ {{- r},\ldots \mspace{14mu},r} \right\}},{{{sel}\left( {x,y,i,j} \right)} = \left\{ \begin{matrix}\begin{matrix}{1,{{w_{L}\left( {x,y,i,j} \right)}\mspace{14mu} {is}\mspace{14mu} \max \mspace{14mu} {or}\mspace{14mu} \min \mspace{14mu} {weight}\mspace{14mu} {of}\mspace{14mu} {all}\mspace{14mu} {w_{L}\left( {x,y,k,l} \right)}},} \\{{k\; \in \left\{ {{- r},\ldots \mspace{14mu},r} \right\}},{l \in \left\{ {{- r},\ldots \mspace{11mu},r} \right\}}}\end{matrix} \\{0,{otherwise}}\end{matrix} \right.}$

wherein x,y,x′,y′,i,j,k,l denote array indices, out(x,y) denotes the array of output data values 121, in(x′,y′) denotes the array of input data values 117, r denotes a size of each kernel of the plurality of position dependent kernels w_(L)(x,y,i,j) 118, sel(x,y,i,j) denotes a selection function and W_(L)′(x,y) denotes a normalization factor. In an embodiment the normalization factor W_(L)′(x,y) can be set equal to 1.

In this embodiment, the neural network layer 120 can be considered to adaptively guide data from the array of input data values 117 to a spatial position of a sub-array of the array of output data values 121 (i.e. the interpolated area) based on the individual position dependent kernel values 118. In this way a sort of more intelligent data un-pooling can be performed. In an embodiment, the input data value corresponding to the spatial position (x,y) is copied to the position (x−i_(max/min),y−j_(max/min)) of the sub-array of output data values (i.e. the interpolated area) of size (2r+1)×(2r+1), where (i_(max/min),j_(max/min)) are the indices of the individual kernel values with the largest (max) or smallest (min) value among all individual kernel values. As can be taken from the equations above, in this embodiment, other values can be set to zero or, in an alternative embodiment, remain unset. Additionally, an aggregation of overlapping sub-arrays, i.e. interpolated areas, can be performed, as in the embodiments described above.
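The guided un-pooling described above can be sketched in NumPy as follows. The sketch is illustrative and makes several assumptions not found in the disclosure: the function name `inverse_pool`, the padded output layout (output position (x,y) stored at index (x + r, y + r)), ties being broken by taking the first extremal kernel value, and unset output positions being left at zero.

```python
import numpy as np

def inverse_pool(inp, kernels, mode="max", normalize=True):
    """"Inverse" max/min pooling guided by position dependent kernels (illustrative only).

    inp     : shape (H, W), input data values
    kernels : shape (H, W, 2r+1, 2r+1), kernels[x, y, i + r, j + r] = w_L(x, y, i, j)
    returns : shape (H + 2r, W + 2r)
    """
    H, W = inp.shape
    k = kernels.shape[2]
    r = k // 2
    flat = kernels.reshape(H, W, k * k)
    idx = flat.argmax(axis=2) if mode == "max" else flat.argmin(axis=2)
    i_sel, j_sel = np.divmod(idx, k)          # kernel indices in 0..2r
    i_sel = i_sel - r                         # offsets in -r..r
    j_sel = j_sel - r
    xs, ys = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    target = (xs - i_sel + r, ys - j_sel + r)  # output position (x - i, y - j), shifted by r
    out = np.zeros((H + 2 * r, W + 2 * r))
    count = np.zeros_like(out)
    # np.add.at accumulates correctly when several inputs hit the same output position.
    np.add.at(out, target, inp)
    np.add.at(count, target, 1.0)
    if normalize:
        out = out / np.where(count > 0, count, 1.0)
    return out
```

Each input value is thus written to exactly one position of its interpolated area, and positions that receive several values hold their average when normalization is enabled.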

In another embodiment, the array of input data values 117 and the array of output data values 121 are two-dimensional arrays and the neural network layer 120 is configured to generate the array of output data values 121 on the basis of the following equations:

${{{out}\left( {x,y} \right)} = {\frac{1}{W_{L^{\prime}}\left( {x,y} \right)}{\sum\limits_{{{{{\{{x^{\prime},y^{\prime}}\}}\text{:}x^{\prime}} - i} = x},{{y^{\prime} - j} = y}}{{{in}\left( {x^{\prime},y^{\prime}} \right)}{{sel}\left( {x,y,x^{\prime},y^{\prime},i,j} \right)}}}}},\mspace{20mu} {{W_{L^{\prime}}\left( {x,y} \right)} = {\sum\limits_{{{{\{{x^{\prime},y^{\prime}}\}}:{x^{\prime} - i}} = x},{{y^{\prime} - j} = y}}{{sel}\left( {x,y,x^{\prime},y^{\prime},i,j} \right)}}},\mspace{20mu} {i \in \left\{ {{- r},\ldots \mspace{14mu},r} \right\}},{j \in \left\{ {{- r},\ldots \mspace{14mu},r} \right\}},{{{sel}\left( {x,y,x^{\prime},y^{\prime},i,j} \right)} = \left\{ \begin{matrix}{1,{{w_{L}\left( {x^{\prime},y^{\prime},i,j} \right)}\mspace{14mu} {is}\mspace{14mu} {maximum}\mspace{14mu} {weight}\mspace{14mu} {of}\mspace{14mu} {all}\mspace{14mu} {w_{L}\left( {x^{''},y^{''},k,l} \right)}},} \\{{{{\left\{ {x^{''},y^{''}} \right\} \text{:}x^{''}} - k} = x},{{y^{''} - l} = y},} \\{{k \in \left\{ {{- r},\ldots \mspace{14mu},r} \right\}},{l \in \left\{ {{- r},\ldots \mspace{14mu},r} \right\}}} \\{0,{otherwise}}\end{matrix} \right.}$

wherein x,y,x′,y′,x″,y″,i,j,k,l denote array indices, out(x,y) denotes the array of output data values 121, in(x′,y′) denotes the array of input data values 117, r denotes a size of each kernel of the plurality of position dependent kernels w_(L)(x′,y′,i,j) 118, sel(x,y,x′,y′,i,j) denotes a selection function and W_(L)′(x,y) denotes a normalization factor. In an embodiment the normalization factor W_(L)′(x,y) can be set equal to 1.

In this embodiment, the neural network layer 120 can be considered to adaptively select output data out(x,y) from input data guided into position (x,y) without performing a weighted average, but selecting as the output data value out(x,y) the input data value in(x′,y′) of the array of input data values 117 which corresponds to the maximum or minimum kernel value w_(L)(x′,y′,i,j). As a result, the output is computed as the input data value which would originally contribute the most (or in the alternative embodiment the least) to the weighted average.
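A minimal NumPy sketch of this per-output selection is given below. It is illustrative only; the function name `select_strongest`, the padded output layout (output position (x,y) stored at index (x + r, y + r)), and the tie-breaking (the first candidate encountered wins) are assumptions made for this example.

```python
import numpy as np

def select_strongest(inp, kernels, mode="max"):
    """For each output position, copy the input whose kernel weight is extremal.

    inp     : shape (H, W), input data values
    kernels : shape (H, W, 2r+1, 2r+1), kernels[x', y', i + r, j + r] = w_L(x', y', i, j)
    returns : shape (H + 2r, W + 2r)
    """
    H, W = inp.shape
    r = kernels.shape[2] // 2
    sign = 1.0 if mode == "max" else -1.0     # a minimum search is a maximum search on -w
    best = np.full((H + 2 * r, W + 2 * r), -np.inf)
    out = np.zeros((H + 2 * r, W + 2 * r))
    for i in range(-r, r + 1):
        for j in range(-r, r + 1):
            w = sign * kernels[:, :, i + r, j + r]
            xs = slice(r - i, r - i + H)       # x = x' - i
            ys = slice(r - j, r - j + W)       # y = y' - j
            better = w > best[xs, ys]
            # Keep the input value whose weight beats the best one seen so far.
            out[xs, ys] = np.where(better, inp, out[xs, ys])
            best[xs, ys] = np.where(better, w, best[xs, ys])
    return out
```

Because exactly one candidate is selected per output position, the normalization factor equals 1 here and no division is needed.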

FIG. 11 shows a flow diagram illustrating a data processing method 1100 based on a neural network 110 according to an embodiment. The data processing method 1100 can be performed by the data processing apparatus 100 shown in FIG. 1 and its different embodiments described above. The data processing method 1100 comprises the operation 1101 of generating by the neural network layer 120 of the neural network 110 from the array of input data values 117 the array of output data values 121 based on a plurality of position dependent kernels 118 and a plurality of input data values of the array of input data values 117. As will be appreciated, further embodiments of the data processing method 1100 result directly from the embodiments of the corresponding data processing apparatus 100 described above. Embodiments of the data processing methods may be implemented and/or performed by one or more processors as described above.

In the following some further details about various aspects and embodiments (upscaling, deconvolution and normalization) are provided.

Upscaling

In embodiments the proposed guided aggregation can be applied for feature map up-scaling (spatial resolution increase). Input values which are features of the feature map are up-scaled one-by-one, forming overlapping output sub-arrays of values which are then aggregated and optionally normalized to form the output data array. Due to additional guiding information in form of position dependent kernels, the up-scaling process for each input value can be performed in a controlled way, enabling addition of higher resolution details, e.g. object or region borders, that were originally not present in the input low-resolution representation. Here, guiding data represents information about object or region borders in higher resolution, and can be obtained by e.g. color-based segmentation, semantic segmentation using preceding neural network layers or an edge map of a texture image corresponding to the processed feature map.

Deconvolution

In embodiments the proposed guided deconvolution can be applied for switchable feature extraction or mixing. Input values which are features of the feature map are deconvolved with adaptable filters which are formed from the input guiding data in form of position dependent kernels. This way, each selected area of the input feature map can be processed with filters especially adapted for that area, producing and mixing only features desired for these regions. Here, guiding data in form of similarity features represents information about object/region borders, obtained by e.g. color-based segmentation, semantic segmentation using preceding neural network layers, an edge map of a texture image corresponding to the processed feature map or a ROI (region of interest) binary map.

Normalization

In general, normalization is advantageous if the output values obtained for different spatial positions are going to be compared to each other per-value, without any intermediate operation. In that case, preservation of the mean (DC) component is beneficial. If such a comparison is not performed, normalization is not required and would only increase complexity. Additionally, one can omit normalization in order to simplify the computations and compute only an approximate result.
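The preservation of the DC component can be checked with a short worked derivation, given here as an illustration using the up-scaling equations above. Assume a constant input in(x′,y′) = c and a non-zero normalization factor W_L′(x,y):

```latex
\mathrm{out}(x,y)
  = \frac{1}{W_L'(x,y)} \sum_{\{x',y'\}:\, x'-i=x,\; y'-j=y} c \cdot w_L(x',y',i,j)
  = c \cdot \frac{\sum_{\{x',y'\}:\, x'-i=x,\; y'-j=y} w_L(x',y',i,j)}{W_L'(x,y)}
  = c .
```

Without normalization (W_L′(x,y) set to 1) the same input yields out(x,y) = c · Σ w_L(x′,y′,i,j), which varies with the position dependent kernel values and with the number of overlapping interpolation areas, so output values at different positions are no longer directly comparable.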

While a particular feature or aspect of the disclosure may have been disclosed with respect to only one of several implementations or embodiments, such feature or aspect may be combined with one or more other features or aspects of the other implementations or embodiments as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms “include”, “have”, “with”, or other variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprise”. Also, the terms “exemplary”, “for example” and “e.g.” are merely meant as an example, rather than the best or optimal. The terms “coupled” and “connected”, along with derivatives may have been used. It should be understood that these terms may have been used to indicate that two elements cooperate or interact with each other regardless of whether they are in direct physical or electrical contact, or they are not in direct contact with each other.

Although aspects have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the aspects shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the aspects discussed herein.

Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.

Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the invention beyond those described herein. While the embodiments of the invention have been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the invention. It is therefore to be understood that within the scope of the appended claims and their equivalents, the embodiments of the invention may be practiced otherwise than as described herein.

1. A data processing apparatus comprising: a processor configured to: provide a neural network, wherein the neural network comprises a neural network layer configured to generate from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of input data values of the array of input data values.
2. The data processing apparatus of claim 1, wherein the neural network comprises an additional neural network layer configured to generate the plurality of position dependent kernels based on an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input data values.
3. The data processing apparatus of claim 2, wherein the neural network is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels and a plurality of position dependent weights.
4. The data processing apparatus of claim 3, wherein the neural network is configured to generate a kernel of the plurality of position dependent kernels by adding the position independent kernels weighted by the associated position dependent weights.
5. The data processing apparatus of claim 3, wherein the plurality of position independent kernels are predetermined or learned and wherein the neural network comprises an additional neural network layer or processing layer configured to generate the plurality of position dependent weights based on an original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values.
6. The data processing apparatus of claim 3, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate a kernel of the plurality of position dependent kernels $w_L(x,y,i,j)$ on the basis of the following equation:

$$w_L(x,y,i,j) = \sum_{f=1}^{N_f} F_f(x,y) \cdot K_f(i,j),$$

wherein $F_f(x,y)$ denotes the plurality of $N_f$ position dependent weights and $K_f(i,j)$ denotes the plurality of position independent kernels.
7. The data processing apparatus of claim 1, wherein the neural network layer is a deconvolutional network layer or an upscaling network layer.
8. The data processing apparatus of claim 1, wherein the array of input data values and the array of output data values are two-dimensional arrays and wherein the neural network layer is a deconvolution network layer configured to generate the array of output data values on the basis of the following equations:

$$\mathrm{out}(x,y,c_o) = \frac{1}{W'_L(x,y,c_o)} \sum_{c_i=1}^{C_i} \; \sum_{\{x',y'\}:\, x'-i=x,\; y'-j=y} \mathrm{in}(x',y',c_i)\, w_L(x',y',c_o,c_i,i,j),$$

$$W'_L(x,y,c_o) = \sum_{c_i=1}^{C_i} \; \sum_{\{x',y'\}:\, x'-i=x,\; y'-j=y} w_L(x',y',c_o,c_i,i,j) \quad \text{or} \quad W'_L(x,y,c_o) = 1,$$

$$i \in \{-r,\ldots,r\}, \quad j \in \{-r,\ldots,r\},$$

wherein $x,y,x',y',i,j$ denote array indices, $\mathrm{out}(x,y,c_o)$ denotes the array of output data values having one or more channels, $\mathrm{in}(x',y',c_i)$ denotes the array of input data values, $r$ denotes a size of each kernel of the plurality of position dependent kernels $w_L(x,y,c_o,c_i,i,j)$ having one or more channels and $W'_L(x,y,c_o)$ denotes a normalization factor.
9. The data processing apparatus of claim 1, wherein the array of input data values and the array of output data values are two-dimensional arrays and wherein the neural network layer is an upscaling network layer configured to generate the array of output data values on the basis of the following equations:

$$\mathrm{out}(x,y) = \frac{1}{W'_L(x,y)} \sum_{\{x',y'\}:\, x'-i=x,\; y'-j=y} \mathrm{in}(x',y')\, w_L(x',y',i,j),$$

$$W'_L(x,y) = \sum_{\{x',y'\}:\, x'-i=x,\; y'-j=y} w_L(x',y',i,j) \quad \text{or} \quad W'_L(x,y) = 1,$$

$$i \in \{-r,\ldots,r\}, \quad j \in \{-r,\ldots,r\},$$

wherein $x,y,x',y',i,j$ denote array indices, $\mathrm{out}(x,y)$ denotes the array of output data values, $\mathrm{in}(x',y')$ denotes the array of input data values, $r$ denotes a size of each kernel of the plurality of position dependent kernels $w_L(x',y',i,j)$ and $W'_L(x,y)$ denotes a normalization factor.
10. The data processing apparatus of claim 1, wherein the neural network layer is configured to generate the array of output data values on the basis of the overlapping interpolation areas, wherein each overlapping interpolation area is generated on the basis of the input data value of the array of input data values and the respective kernel of the plurality of position dependent kernels by assigning to the overlapping interpolation area the input data value of the array of input data values at the position corresponding to the position of the maximum or minimum value of the respective kernel of the plurality of position dependent kernels and zero otherwise.
11. The data processing apparatus of claim 1, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate the array of output data values on the basis of the following equations:

$$\mathrm{out}(x,y) = \frac{1}{W'_L(x,y)} \sum_{\{x',y'\}:\, x'-i=x,\; y'-j=y} \mathrm{in}(x',y')\, \mathrm{sel}(x',y',i,j),$$

$$W'_L(x,y) = \sum_{\{x',y'\}:\, x'-i=x,\; y'-j=y} \mathrm{sel}(x',y',i,j) \quad \text{or} \quad W'_L(x,y) = 1,$$

$$i \in \{-r,\ldots,r\}, \quad j \in \{-r,\ldots,r\},$$

$$\mathrm{sel}(x,y,i,j) = \begin{cases} 1, & w_L(x,y,i,j) \text{ is max or min weight of all } w_L(x,y,k,l),\; k \in \{-r,\ldots,r\},\; l \in \{-r,\ldots,r\}, \\ 0, & \text{otherwise,} \end{cases}$$

wherein $x,y,x',y',i,j,k,l$ denote array indices, $\mathrm{out}(x,y)$ denotes the array of output data values, $\mathrm{in}(x',y')$ denotes the array of input data values, $r$ denotes a size of each kernel of the plurality of position dependent kernels $w_L(x,y,i,j)$ (118), $\mathrm{sel}(x,y,i,j)$ denotes a selection function and $W'_L(x,y)$ denotes a normalization factor.
12. The data processing apparatus of claim 1, wherein the neural network layer is configured to generate the array of output data values, wherein each value of the array of output data values at the overlapping spatial position is generated on the basis of the input data values of the array of input data values for which values of the respective kernels of the plurality of position dependent kernels at the overlapping spatial position are the maximum or minimum value among all the values of the respective kernels of the plurality of position dependent kernels at the overlapping spatial position.
13. The data processing apparatus of claim 1, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate the array of output data values on the basis of the following equations:

$$\mathrm{out}(x,y) = \frac{1}{W'_L(x,y)} \sum_{\{x',y'\}:\, x'-i=x,\; y'-j=y} \mathrm{in}(x',y')\, \mathrm{sel}(x,y,x',y',i,j),$$

$$W'_L(x,y) = \sum_{\{x',y'\}:\, x'-i=x,\; y'-j=y} \mathrm{sel}(x,y,x',y',i,j) \quad \text{or} \quad W'_L(x,y) = 1,$$

$$i \in \{-r,\ldots,r\}, \quad j \in \{-r,\ldots,r\},$$

$$\mathrm{sel}(x,y,x',y',i,j) = \begin{cases} 1, & w_L(x',y',i,j) \text{ is maximum weight of all } w_L(x'',y'',k,l),\; \{x'',y''\}:\, x''-k=x,\; y''-l=y,\; k \in \{-r,\ldots,r\},\; l \in \{-r,\ldots,r\}, \\ 0, & \text{otherwise,} \end{cases}$$

wherein $x,y,x',y',x'',y'',i,j,k,l$ denote array indices, $\mathrm{out}(x,y)$ denotes the array of output data values, $\mathrm{in}(x',y')$ denotes the array of input data values, $r$ denotes a size of each kernel of the plurality of position dependent kernels $w_L(x',y',i,j)$ (118), $\mathrm{sel}(x,y,x',y',i,j)$ denotes a selection function and $W'_L(x,y)$ denotes a normalization factor.
14. A data processing method comprising: generating by a neural network layer of a neural network from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of different input data values of the array of input data values.
15. The method of claim 14, wherein the neural network comprises an additional neural network layer configured to generate the plurality of position dependent kernels based on an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input data values.
16. The method of claim 15, wherein the neural network is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels and a plurality of position dependent weights.
17. The method of claim 16, wherein the neural network is configured to generate a kernel of the plurality of position dependent kernels by adding the position independent kernels weighted by the associated position dependent weights.
18. The method of claim 16, wherein the plurality of position independent kernels are predetermined or learned and wherein the neural network comprises an additional neural network layer or processing layer configured to generate the plurality of position dependent weights based on an original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values.
19. The method of claim 16, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate a kernel of the plurality of position dependent kernels $w_L(x,y,i,j)$ on the basis of the following equation:

$$w_L(x,y,i,j) = \sum_{f=1}^{N_f} F_f(x,y) \cdot K_f(i,j),$$

wherein $F_f(x,y)$ denotes the plurality of $N_f$ position dependent weights and $K_f(i,j)$ denotes the plurality of position independent kernels.
20. A non-transitory computer-readable medium comprising program code stored therein, which when executed by a processor, causes the processor to perform operations comprising: generating by a neural network layer of a neural network from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of different input data values of the array of input data values.
21. The computer-readable medium of claim 20, wherein the neural network comprises an additional neural network layer configured to generate the plurality of position dependent kernels based on an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input data values.
22. The computer-readable medium of claim 21, wherein the neural network is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels and a plurality of position dependent weights.
23. The computer-readable medium of claim 22, wherein the neural network is configured to generate a kernel of the plurality of position dependent kernels by adding the position independent kernels weighted by the associated position dependent weights.
24. The computer-readable medium of claim 22, wherein the plurality of position independent kernels are predetermined or learned and wherein the neural network comprises an additional neural network layer or processing layer configured to generate the plurality of position dependent weights based on an original array of original input data values of the neural network, wherein the original array of original input data values of the neural network comprises the array of input data values or another array of input data values associated to the array of input data values.
25. The computer-readable medium of claim 22, wherein the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is configured to generate a kernel of the plurality of position dependent kernels $w_L(x,y,i,j)$ on the basis of the following equation:

$$w_L(x,y,i,j) = \sum_{f=1}^{N_f} F_f(x,y) \cdot K_f(i,j),$$

wherein $F_f(x,y)$ denotes the plurality of $N_f$ position dependent weights and $K_f(i,j)$ denotes the plurality of position independent kernels.